Abstract 

The purpose of the present paper is twofold. First, the authors illustrate how 
displaying disattenuated correlation coefficients alongside their unadjusted counterparts 
will allow the reader to assess the impact of unreliability on each bivariate relationship. 
Second, the authors demonstrate how a proposed new “what if reliability” analysis can 
complement the conventional null hypothesis significance test (NHST) of bivariate 
relationships. Such analyses illustrate how the sample size needed to detect a 
statistically significant bivariate relationship decreases as the observed score reliability 
coefficient pertaining to the independent and/or dependent measure (theoretically) 
increases, holding all other factors constant. As such, “what if reliability” analyses will 
help researchers to interpret their results by considering the extent to which the reliability 
coefficient(s) contributed, or failed to contribute, to the ability to achieve a statistically 
significant result. 
A Proposed New “What if Reliability” Analysis 
for Assessing the Statistical Significance of Bivariate Relationships 
One of the assumptions underlying null hypothesis significance tests (NHST) is 
that all variables involved are measured without error (Myers, 1986; Onwuegbuzie & 
Daniel, in press-a). Unfortunately, when measurement errors are present, as is typically 
the case in the social and behavioral sciences, the relationships computed from the 
sample data will systematically underestimate the strength of the associations in the 
population. Indeed, indices of both statistical significance and practical significance will 
be adversely affected by errors in measurement (Onwuegbuzie, 2001). 

In the two-variable case, errors of measurement yield biased estimates of 
correlation coefficients that attenuate the true relationships. In fact, the greater the 
measurement error, the more the correlation coefficient is attenuated. Thus, knowledge 
of the amount of measurement error is vital. A common way of assessing measurement error is via the 
reliability coefficient. Indeed, the relationship between reliability and the standard error of 
measurement can be seen from the following formula: 

$$\sigma_E = \sigma_X \sqrt{1 - r_{xx}}$$

where $\sigma_E$ is the standard error of measurement, $\sigma_X$ is the standard deviation of the full-scale 
scores, and $r_{xx}$ is the reliability of the scale scores. Thus, the lower the score 
reliability, the larger the standard error of measurement. As such, the reliability estimate 
provides valuable information about the measurement error. Unfortunately, most 
researchers do not report reliability estimates for their own data (Henson, 2000; 
Onwuegbuzie & Daniel, 2000, 2001, in press-b; Thompson & Vacha-Haase, 2000; 
Vacha-Haase, 1998). In fact, in studies examining whether researchers report reliability 
coefficients, the proportion of researchers who do not report reliability coefficients for 
data from their underlying sample has been found to range from 64.4% (Vacha-Haase, 
Ness, Nilsson, & Reetz, 1999) to 86.9% (Vacha-Haase, 1998). Hence, many 
researchers are not in a position to determine the extent to which measurement error 
affected the observed findings in their study. 
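As a minimal illustration of the standard error of measurement formula given earlier, the Python sketch below evaluates $\sigma_E = \sigma_X\sqrt{1 - r_{xx}}$ for a hypothetical scale. The function name, the standard deviation of 10 points, and the three reliability values are assumptions chosen only to show how the error band widens as score reliability falls.

```python
import math


def standard_error_of_measurement(sd_x: float, reliability: float) -> float:
    """Classical test theory SEM: sigma_E = sigma_X * sqrt(1 - r_xx)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_x * math.sqrt(1.0 - reliability)


# Hypothetical scale whose scores have a standard deviation of 10 points.
for r_xx in (0.95, 0.80, 0.60):
    sem = standard_error_of_measurement(10.0, r_xx)
    print(f"r_xx = {r_xx:.2f}  ->  SEM = {sem:.2f}")
```

With these assumed values, the standard error of measurement rises from roughly 2.2 points at $r_{xx} = .95$ to roughly 6.3 points at $r_{xx} = .60$.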

In an attempt to increase substantially the proportion of researchers who report 
sample-specific reliability estimates, Pedhazur and Schmelkin (1991) asserted that 
Researchers who bother at all to report reliability estimates for [scores on] the 
instruments they use frequently report only reliability estimates contained in the 
manuals of the instruments or estimates reported by other researchers. Such 
information may be useful for comparative purposes, but it is imperative to 
recognize that the relevant reliability estimate is the one obtained for the sample 
used in the [current] study under consideration. (p. 86) 

More recently, the American Psychological Association (APA) Board of Scientific Affairs, 
which convened a committee called the Task Force on Statistical Inference for the purpose 
of providing recommendations for the use of statistical methods, offered the following 
suggestion for researchers: 

If a questionnaire is used to collect data, summarize the psychometric properties 
of its scores with specific regard to the way the instrument is used in a population. 
Psychometric properties include measures of validity, reliability, and any other 
qualities affecting conclusions.... Thus, authors should provide reliability 
coefficients of the scores for the data being analyzed even when the focus of their 
research is not psychometric. Interpreting the size of observed effects requires an 
assessment of the reliability of the scores. (Wilkinson & the Task Force on 
Statistical Inference, 1999, p. 5) 

The latest edition of the Publication Manual of the American Psychological Association (APA), the fifth edition, 
provided some much-needed recommendations and stipulations for presenting 
“informationally adequate statistics” (APA, 2001, p. 23), including the reporting of effect 
sizes and confidence intervals. Clearly, the APA Task Force played an important role 
here. Unfortunately, despite the recommendations of the Task Force noted above, only 
one suggestion was given by APA (2001) regarding the delineation of sample-specific 
reliability estimates: 

For correlational analyses (e.g., multiple regression analysis, factor analysis, and 
structural equation modeling), the sample size and variance-covariance (or 
correlation) matrix are needed, accompanied by other information specific to the 
procedure used (e.g., variable means, reliabilities, hypothesized structural 
models, and other parameters)...” [emphasis and end parenthesis added] (p. 23) 
Moreover, the fact that the word reliabilities was included only in parentheses gives the 
impression that the reporting of sample-specific reliability coefficients is not of primary 
importance, or even optional. Thus, it is likely that the majority of researchers will 
continue to fail to report these indices, let alone to report them consistently. Nor is 
rhetoric provided by a handful of research methodologists sufficient to reverse this trend. 
Rather, what is needed is more compelling evidence of how information about 
current-sample-specific reliability estimates can facilitate data analysis and interpretation. 

Therefore, the objective of the present paper is to provide researchers with further 
incentive for reporting reliability coefficients for their own data. Specifically, the purpose 
of the current paper is twofold. First, the authors illustrate how displaying disattenuated 
correlation coefficients alongside their unadjusted counterparts will allow the reader to 
assess the impact of unreliability on each bivariate relationship. Second, the authors 
demonstrate how a proposed new “what if reliability” analysis can complement the 
conventional NHST of bivariate relationships. Such analyses indicate how the sample 
size needed to detect a statistically significant bivariate relationship decreases as the 
observed score reliability coefficient pertaining to the independent and/or dependent 
measure (theoretically) increases, holding all other factors constant. As such, it is 
contended that “what if reliability” analyses will help researchers to interpret their results 
by considering the extent to which the reliability of scores on one or both variables affects 
the statistical significance of the bivariate correlation coefficient. 

Correction for Attenuation 

In every empirical study, researchers strive to utilize measures that yield scores 
that are as error-free as possible. While this goal often is met for empirical research 
conducted in the physical and life sciences, this is seldom the case when dealing with 
social, behavioral, psychological, sociological, and educational constructs. Unfortunately, 
when measurement errors prevail, any relationship that is derived from the underlying 
data will systematically underestimate the strength of the association in the population 
(Huck, 2000). In other words, errors of measurement produce negatively biased 
estimates that attenuate the true relationship. As noted 
earlier, the greater the measurement error, the more the association is attenuated. 
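The attenuation described above can be seen directly in a small Monte Carlo sketch. It assumes the classical test theory model (observed score = true score + independent random error) with normally distributed true scores and errors; the true correlation of .40 and the reliabilities of .70 and .80 are illustrative values, chosen to match the double-correction example used later in this paper, and the function names are hypothetical. Under that model, the expected observed correlation is the true correlation multiplied by $\sqrt{r_{xx} r_{yy}}$, roughly .30 here.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_observed_r(true_rho, rel_x, rel_y, n=100_000, seed=1):
    """Observed r between two fallible measures under classical test theory.

    True scores are standard normal with correlation true_rho. Independent
    normal errors with variance (1 - rel) / rel are added, so that the ratio
    var(true) / var(observed) equals the requested reliability.
    """
    rng = random.Random(seed)
    err_sd_x = math.sqrt((1.0 - rel_x) / rel_x)
    err_sd_y = math.sqrt((1.0 - rel_y) / rel_y)
    xs, ys = [], []
    for _ in range(n):
        t_x = rng.gauss(0.0, 1.0)
        t_y = true_rho * t_x + math.sqrt(1.0 - true_rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(t_x + rng.gauss(0.0, err_sd_x))
        ys.append(t_y + rng.gauss(0.0, err_sd_y))
    return pearson_r(xs, ys)


r_obs = simulate_observed_r(true_rho=0.40, rel_x=0.70, rel_y=0.80)
expected = 0.40 * math.sqrt(0.70 * 0.80)
print(f"simulated observed r = {r_obs:.3f}; attenuation model predicts {expected:.3f}")
```

The simulated value should land close to the predicted one; exact agreement is not expected because of sampling error.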

Researchers and authors of statistics textbooks alike have routinely 
acknowledged that statistical power is affected by the following three components: (a) 
size of the sample, (b) level of statistical significance (i.e., Type I error probability allowed 
for), and (c) effect size. However, as Onwuegbuzie and Daniel (2000) have noted, 
the role of score reliability is often neglected. Even APA (2001) failed to mention the 
influence of score reliability on statistical power, as illustrated in the following statement: 
Take seriously the statistical power consideration associated with your tests of 
hypotheses. Such considerations relate to the likelihood of correctly rejecting the 
tested hypotheses, given a particular alpha level, effect size, and sample size. In 
that regard, you should routinely provide evidence that your study has sufficient 
power to detect effects of substantive interest (see Cohen, 1988). You should be 
similarly aware of the role played by sample size in cases in which not rejecting 
the null hypothesis is desirable (i.e., when you wish to argue that there are no 
differences)... (p. 25) 

The way in which score reliability can affect statistical power recently was 
illustrated by Onwuegbuzie and Daniel (2000), who demonstrated that subgroups with 
scores that generate markedly different reliability estimates can seriously reduce 
statistical power, even when the full-sample reliability coefficients are adequate. 
Specifically, Onwuegbuzie and Daniel produced two datasets that consisted of scores 
pertaining to two sub-samples. The first dataset, which they called the invariant-reliability 
dataset (p. 23), was constructed such that scores of both subgroups yielded adequate 
classical test theory alpha reliability coefficients, namely, .89 and .71 (Nunnally & Bernstein, 
1994); also, the full-sample reliability estimate of .83 was adequate. Conversely, the 
second dataset, which the authors termed the variant-reliability dataset (p. 23), was 
generated such that although scores from the full sample yielded an adequate reliability 
coefficient (i.e., .79), only the first subgroup generated an adequate reliability estimate 
(i.e., .89), whereas scores from the second subgroup yielded a low reliability coefficient 
(i.e., .66). Onwuegbuzie and Daniel found that the invariant-reliability dataset, containing 
adequate subgroup reliability estimates, yielded a statistically significant difference 
between the two groups. On the other hand, the variant-reliability dataset yielded no 
statistically significant difference, even though the respective group means in the 
variant-reliability dataset were identical to those in the invariant-reliability dataset. 

Onwuegbuzie and Daniel (2000) recommended that researchers report reliability 
coefficients not only for the full sample at hand but also for each subgroup. Indeed, 
providing sub-sample reliability estimates alongside those for the full sample is consistent 
with the following recommendations of APA (2001): (a) “report the data in sufficient detail to 
justify the conclusions” (p. 20); and (b) “When reporting inferential statistics, include 
sufficient information to help the reader fully understand the analyses conducted and 
possible explanations for the outcomes of these analyses” (p. 23). 

Because low reliability coefficients tend to reduce statistical power, in cases when 
the null hypothesis is not rejected, and one or more of the measures generate scores 
that have a low reliability coefficient, the researcher cannot be certain whether the 
statistically nonsignificant finding represents a real outcome or a statistical artifact. This is 
true in both the bivariate case and the multivariate case. With respect to the former, 
some theorists recommend use of sample-specific reliability coefficients to adjust 
correlation coefficients to account for the estimated amount of unreliability. These 
analysts use a correction-for-attenuation formula which yields an adjusted/disattenuated 
correlation coefficient that is always higher than the uncorrected, raw r (Huck, 2000), with 
the exception of the utopian case of both variables being measured with absolutely no 
error. 

Three correction-for-attenuation formulae are commonly used. Some theorists 
utilize Spearman’s (1910) double correction formula: 

$$\rho_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}}\sqrt{r_{yy}}}$$

where $\rho_{xy}$ is the corrected correlation coefficient, $r_{xy}$ is the obtained sample correlation 
coefficient, $r_{xx}$ is the reliability of scores yielded by the measure of the independent 
variable, and $r_{yy}$ is the reliability of scores generated by the measure of the dependent 
variable. This formula corrects for unreliability of scores generated by measures of both 
the independent and dependent variables. This is how the formula works. Suppose the 
correlation coefficient between an independent variable and a dependent variable was 
.30, representing a moderate relationship (Cohen, 1988). Let us also suppose that the 
dependent measure yielded scores that produced a reliability estimate of .80, whereas 
the independent measure generated scores that resulted in a reliability coefficient of .70. 
Using these values, the double correction formula would lead to a corrected correlation 
coefficient of 

$$\rho_{xy} = \frac{.30}{\sqrt{.70}\sqrt{.80}} = .40$$

The adjusted coefficient indicates that if scores pertaining to the independent and 
dependent measures had generated perfect reliability (i.e., no measurement error), 
holding all other data constant, the correlation coefficient would have been .40, which is 
33.3% larger. This illustrates how unreliability in both measures adversely affects 
statistical power and effect sizes of bivariate relationships. 
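The worked example above can be checked with a few lines of Python. The function name `double_correction` is hypothetical (it is not part of any library); it simply evaluates Spearman's double correction formula as defined in this section.

```python
import math


def double_correction(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's double correction: r_xy / (sqrt(r_xx) * sqrt(r_yy))."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliability estimates must lie in (0, 1]")
    return r_xy / (math.sqrt(r_xx) * math.sqrt(r_yy))


rho = double_correction(0.30, r_xx=0.70, r_yy=0.80)
increase = 100.0 * (rho / 0.30 - 1.0)
print(f"corrected r = {rho:.2f}, increase = {increase:.1f}%")  # corrected r = 0.40, increase = 33.6%
```

The unrounded increase is about 33.6%; the 33.3% quoted in the text results from rounding the corrected coefficient to .40 before computing the percentage.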

Other measurement theorists advocate one of the single correction formulae 

$$\rho_{xy} = \frac{r_{xy}}{\sqrt{r_{yy}}}$$

or 

$$\rho_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}}}$$

where the first single correction formula corrects for unreliability in scores generated by 
measures of the dependent variable only (i.e., dependent variable-based corrected 
correlation coefficient), and the second single correction formula adjusts for unreliability 
in scores yielded by measures of the independent variable only (i.e., independent 
variable-based corrected correlation coefficient). These single correction formulae are 
useful in cases in which the reliability of scores on one of the variables is unknown to the 
researcher (e.g., when using data from standardized achievement tests in which the raw 
scores are not reported by the agency administering the test). If the correlation 
coefficient was again .3 and the reliability estimate for the dependent variable was .8, 
with the reliability estimate for the independent variable being unknown, then the 
dependent-variable-based corrected correlation coefficient would be 



$$\rho_{xy} = \frac{.30}{\sqrt{.80}} = .34$$



This represents a 13.3% increase in the correlation coefficient. Further, if the correlation 
coefficient was again .3 and the reliability estimate for the independent variable was .7, 
with the reliability estimate for the dependent variable being unknown, then the 
independent-variable-based corrected correlation coefficient would be 



$$\rho_{xy} = \frac{.30}{\sqrt{.70}} = .36$$

This represents a 20.0% increase in the correlation coefficient. A comparison of the three 
corrected correlation coefficients shows that the double correction formula yields a larger 
correction than does either of the single correction formulae. Indeed, this is always the case 
except when the independent or dependent variable is measured without any error (i.e., 
perfect reliability), in which case the double correction formula will be identical to one of 
the single correction formulae. As such, each single correction formula is equivalent to 
the double correction formula with the unknown reliability estimate set equal to one. 
Because it is virtually impossible for score reliability to equal unity in the social and 
behavioral sciences, correlation coefficients disattenuated with a single correction formula 
represent an underestimate of the fully corrected value. Simply put, it is better to utilize the 
double correction formula than either of the two single correction formulae because the former 
uses the full reliability information. This provides further justification for reporting score 
reliability for all measures whenever possible. 
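The relationship among the three formulae can also be expressed in code: each single correction is the double correction with the unknown reliability set to 1.00. The Python sketch below uses hypothetical function names and reproduces the three worked values for $r_{xy} = .30$, $r_{xx} = .70$, and $r_{yy} = .80$.

```python
import math


def double_correction(r_xy, r_xx, r_yy):
    """Corrects for unreliability in both measures."""
    return r_xy / (math.sqrt(r_xx) * math.sqrt(r_yy))


def dv_single_correction(r_xy, r_yy):
    """Corrects for unreliability in the dependent measure only."""
    return r_xy / math.sqrt(r_yy)


def iv_single_correction(r_xy, r_xx):
    """Corrects for unreliability in the independent measure only."""
    return r_xy / math.sqrt(r_xx)


r_xy, r_xx, r_yy = 0.30, 0.70, 0.80
results = {
    "double": double_correction(r_xy, r_xx, r_yy),
    "DV only": dv_single_correction(r_xy, r_yy),
    "IV only": iv_single_correction(r_xy, r_xx),
}
for label, rho in results.items():
    print(f"{label:<8} {rho:.2f}")

# Each single correction equals the double correction with the other reliability set to 1.0.
assert math.isclose(dv_single_correction(r_xy, r_yy), double_correction(r_xy, 1.0, r_yy))
assert math.isclose(iv_single_correction(r_xy, r_xx), double_correction(r_xy, r_xx, 1.0))
```

The printed values (.40, .34, and .36) match the worked examples, and the two assertions express the equivalence stated in the text.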

Table 1 presents corrected correlation values for an observed correlation 
coefficient of .30 across various combinations of score reliability estimates for the 
independent and dependent measures. Researchers could use this table after computing 
correlation coefficients. As noted above, if the score reliability estimate for one of the 
variables is unknown, then the researcher should set it equal to 1.00 in Table 1. Indeed, 
this table could be used to produce a table of corrected intercorrelations. 
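A table of this kind can also be generated rather than looked up. The Python sketch below is an assumption about layout, not a reproduction of Table 1: the reliability levels in `levels` are illustrative, 1.00 stands for an unknown reliability as recommended above, and any value above 1.00 in magnitude is truncated to unity. The function names are hypothetical.

```python
import math


def corrected_r(r_xy, r_xx, r_yy):
    """Double-corrected correlation, truncated to 1.0 in magnitude."""
    rho = r_xy / math.sqrt(r_xx * r_yy)
    return max(-1.0, min(1.0, rho))


def correction_grid(r_xy, levels):
    """Rows: reliability of the independent measure; columns: dependent measure."""
    return [[corrected_r(r_xy, r_xx, r_yy) for r_yy in levels] for r_xx in levels]


levels = [1.00, 0.90, 0.80, 0.70, 0.60, 0.50]  # illustrative reliability levels
grid = correction_grid(0.30, levels)

print("r_xx \\ r_yy" + "".join(f"{r:>7.2f}" for r in levels))
for r_xx, row in zip(levels, grid):
    print(f"{r_xx:>11.2f}" + "".join(f"{v:>7.3f}" for v in row))
```

Passing an observed correlation other than .30 to `correction_grid` gives the corresponding grid for that coefficient.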



Insert Table 1 about here 



It should be noted that although it is theoretically impossible for a correlation coefficient 
to exceed 1.00, it is empirically feasible to compute such a coefficient using any of the three 
correction formulae above. For example, if $r_{xy} = .6$, $r_{yy} = .7$, and $r_{xx} = .5$, then the 
corrected correlation coefficient, $\rho_{xy}$, will equal 1.01. Although such a value is 
meaningless, Spearman (1910) attributed this anomalous coefficient to sampling error in 
both variables, whereas Johnson (1944) contended that this occurrence was due to an 
inadequate sample size. Consistent with Johnson’s assertions, Nunnally (1978) reported 
that such greater-than-unity coefficients will be produced when sample sizes are 
less than 300. Whatever the explanation for greater-than-unity coefficients, all 
corrected correlation coefficients greater than 1.00 should be truncated to unity. 
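The greater-than-unity case can be reproduced with the example values given above. The Python sketch below uses a hypothetical function name and returns both the raw double-corrected value and the value truncated to unity, so that a researcher can report that truncation occurred instead of silently hiding it.

```python
import math


def disattenuate_with_truncation(r_xy, r_xx, r_yy):
    """Return (reported, raw): raw is the double correction, reported is raw clipped to [-1, 1]."""
    raw = r_xy / math.sqrt(r_xx * r_yy)
    reported = max(-1.0, min(1.0, raw))
    return reported, raw


reported, raw = disattenuate_with_truncation(0.60, r_xx=0.50, r_yy=0.70)
print(f"raw = {raw:.2f}, reported = {reported:.2f}")  # raw = 1.01, reported = 1.00
```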

As informative as this technique can be, correcting for attenuation is an extremely 
controversial technique because it is subject to misapplication and misinterpretation 
(Muchinsky, 1996; Onwuegbuzie & Daniel, in press-b). First, some researchers 
incorrectly claim that the method of correcting for attenuation improves the predictive 
accuracy of measures; yet, this can never occur. That is, applying a correction for 
attenuation cannot render a measure more predictive than it actually is (Muchinsky, 
1996). 

Second, a misapplication stems from the practice by many meta-analysts of 
disattenuating findings from individual studies before aggregating them into a composite 
score (i.e., effect size measure). According to Muchinsky (1996), this practice is relatively 
common. Unfortunately, because researchers are not consistent in the procedures that 
they employ to estimate reliability (i.e., internal consistency, test-retest, and parallel 
forms), these meta-analysts end up violating a major principle of classical 
measurement theory, namely, that different types of reliability coefficients are not 
interchangeable (Cronbach, 1947). Alternatively stated, aggregating indices that have been 
disattenuated using different types of reliability estimates seriously distorts the validity of 
the resultant effect size estimates. Moreover, as noted by Onwuegbuzie and Daniel (in press-b), because 
many researchers presently do not report sample-specific reliability coefficients in their 
reports, meta-analysts who choose to disattenuate the correlation coefficients of original 
researchers are left with missing data. Removing studies from the meta-analysis for 
which no reliability estimates are provided, or using arbitrary (e.g., .60; Schmidt, Hunter, & 
Urry, 1976) imputed estimates or published values for missing reliability coefficients, 
leads to biased composite effect sizes even when score reliability is high (Onwuegbuzie 
& Daniel, in press-b). The finding that the proportion of researchers who do not report 
reliability coefficients for their underlying sample ranges from 64.4% to 86.9% 
(Vacha-Haase, 1998; Vacha-Haase, Ness, Nilsson, & Reetz, 1999) suggests that the amount of 
bias that prevails for meta-analyses conducted by those who disattenuate correlation 
coefficients typically is large. 

A third area of contention is which type of reliability coefficient to use in the correction. Whereas 
Johnson (1950) recommended that test-retest reliability coefficients be utilized in 
correction formulas, Guilford (1954) and others advocated the use of coefficients of 
equivalence. However, the majority of researchers (e.g., Nunnally & Bernstein, 1994) 
have promoted the use of internal consistency estimates. Indeed, use of internal 
consistency appears to be justified because it is the most commonly reported index, and 
thus should be utilized unless temporal instability is considered as a source of error, in 
which case the coefficient of stability should be employed (Lord & Novick, 1968). 
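Because internal consistency estimates are the ones most often used in correction formulae, it may help to show how a sample-specific coefficient alpha can be computed from item scores. The Python sketch below implements the standard formula for Cronbach's alpha with the standard library `statistics` module; the function name and the item data in the example are hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Coefficient alpha for a list of item columns (one list of scores per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Sample variances (n - 1 denominator) are used; the denominator cancels in the ratio.
    """
    k = len(items)
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("items need the same number (at least two) of respondents")
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1.0 - item_variance_sum / statistics.variance(totals))


# Hypothetical 4-item scale answered by 6 respondents (one list per item).
scores = [
    [4, 3, 5, 2, 4, 3],
    [4, 2, 5, 3, 4, 2],
    [5, 3, 4, 2, 5, 3],
    [3, 3, 5, 2, 4, 2],
]
print(f"coefficient alpha = {cronbach_alpha(scores):.2f}")
```

Computed on the researcher's own data, a value of this kind is the sample-specific estimate that the correction formulae require.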

Perhaps the biggest criticism of the use of disattenuated correlation coefficients is 
that it may mislead the researcher “into believing that a better correlation has been found 
than that actually evidenced in the available data” (Nunnally, 1978, p. 237). However, 
this criticism does not necessarily mean that corrected correlations should not be used. 
Rather, it suggests that researchers should be extremely careful when interpreting them. 
Strictly speaking, correction for attenuation represents an estimate of how large the 
correlation would be if the two underlying variables yielded scores that were perfectly 
reliable, no more and no less. Indeed, it could be argued that the correction for attenuation 
label is somewhat misleading because this class of formula represents an upper bound 
for the observed correlation coefficient rather than a correction. As such, corrected 
correlation coefficients are theoretical rather than actual values. However, as long as 
researchers bear this in mind, disattenuated correlation coefficients can provide useful 
information. For example, presenting both the uncorrected and corrected correlation 
coefficients will allow the reader to assess the impact of unreliability on each bivariate 
relationship (Onwuegbuzie & Daniel, in press-b). An even more useful application of 
correction for attenuation formulae is that they allow the researcher to conduct a “what 
if reliability” analysis. It is to this that we now turn. 

Proposed “What If Reliability” Analysis 
Thompson (1989a, 1989b) proposed a “what if” method to facilitate the 
interpretation of null hypothesis significance tests in a sample size context. Specifically, 
the “what if” procedure helps the researcher to determine the extent to which the sample 
size, as opposed to the effect size, was responsible for the observed statistically 
significant or statistically nonsignificant finding. Moreover, this technique allows the analyst 
to specify how large a sample is needed to obtain statistical significance for an observed 
finding in cases in which the null hypothesis is not rejected, as well as how small a sample 
can be before an observed statistically significant result is no longer statistically significant. 
Thompson’s (1989a, 1989b) method utilizes the uncorrected effect sizes, which are in the 
metric of the sample. Kieffer and Thompson (1999) criticized the use of “what if” analyses with 
uncorrected effect sizes because such effect sizes do not take into account the fact “that 
the amount of sampling error (and therefore the positive bias in the ‘uncorrected’ effect 
size) will change as sample size itself changes” (Kieffer & Thompson, 1999, p. 11). In 
their award-winning work, Kieffer and Thompson (1999) subsequently proposed a “what 
if” analysis that employs the corrected estimate of the population effect size as the 
metric for examining the influence of sample size on observed p-values. 

The proposed "what if reliability" analytic method utilizes a similar logic to that of 
Kieffer and Thompson (1999), except that it focuses on the reliability-corrected correlation 
coefficient as the effect size index. The “what if reliability” analysis utilizes the fact that as the reliability 
estimate for the scores on the dependent and/or the independent measure decreases, 
the difference between the uncorrected and corrected correlation coefficient increases, 
and, consequently, the sample size needed for statistical significance of the corrected correlation 
coefficient decreases. For observed correlation coefficients that are statistically 
significant, the “what if reliability” analysis determines how small the sample size can be 
before the measurement-error-free correlation is no longer statistically significant, as a 
function of the score reliability of the independent and dependent measures. For 
observed correlation coefficients that are not statistically significant, the “what if reliability” 
analysis enables the researcher either (a) to ascertain how small the sample size 
theoretically can be before the measurement-error-free correlation coefficient is no longer 
statistically significant, as a function of the score reliability of the independent and 
dependent measures; or (b) in the case where the corrected correlation coefficient is still 
not statistically significant, to delineate the theoretical upper bound for the observed 
correlation coefficient (i.e., effect size), as well as the sample size needed to produce a 
statistically significant finding. The sample size needed in (b) would be larger than the 
original sample size; however, it would be smaller than that needed for the observed 
correlation coefficient to be statistically significant. 

The steps for conducting the “what if reliability” analysis are as follows. First, the 
observed correlation coefficient must be computed. Second, the score reliability (i.e., 
internal consistency) estimates pertaining to the independent and/or the dependent 
variables should be used in the double-correction formula (setting any unknown reliability 
estimate equal to 1.00) to determine the corrected correlation coefficient. (Confidence 
intervals can be constructed around the disattenuated correlation coefficient.) Third, if 
this corrected correlation coefficient is statistically significant, then the sample size 
should be determined such that any reduction in cases would yield a statistically non- 
significant result. Conversely, if the corrected correlation coefficient is not statistically 
significant, then the sample size should be determined at which the corrected correlation 
coefficient would become statistically significant. 
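The three steps can be sketched in Python. This is an illustration under stated assumptions, not a transcription of the Excel commands in Table 3: it assumes a two-tailed test of $H_0\!: \rho = 0$ using the usual statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ on $n - 2$ degrees of freedom, computes the p-value from the regularized incomplete beta function (implemented with the standard continued-fraction method so that only the standard library is needed), and then searches for the smallest sample size at which the corrected coefficient is statistically significant. Because the p-value falls steadily as n grows, that single number answers both questions in the third step: below it, the corrected coefficient is no longer statistically significant. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def p_value_for_r(r, n):
    """Two-tailed p-value for H0: rho = 0 with t = r*sqrt(n-2)/sqrt(1-r^2) on n-2 df."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1.0 - r * r)
    # Two-tailed Student t tail area: P(|T| > t) = I_{df/(df+t^2)}(df/2, 1/2).
    return reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def what_if_reliability(r_xy, r_xx=1.0, r_yy=1.0, alpha=0.05, n_max=1_000_000):
    """Return (corrected |r|, smallest n at which it is significant); unknown reliabilities stay 1.0."""
    # Step 2: double correction on the magnitude of r, truncated to unity.
    rho = min(1.0, abs(r_xy) / math.sqrt(r_xx * r_yy))
    # Step 3: p decreases as n grows, so the first significant n is the threshold.
    for n in range(3, n_max + 1):
        if p_value_for_r(rho, n) < alpha:
            return rho, n
    return rho, None


for rel in (0.8, 0.6):
    rho, n_min = what_if_reliability(0.30, r_xx=rel, r_yy=rel)
    print(f"reliability {rel}: corrected r = {rho:.3f}, smallest significant n = {n_min}")
```

For an observed correlation of .30 with both reliabilities at .8 and at .6, this search yields minimum sample sizes of 28 and 16, in line with the worked example in the text.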

Table 2 illustrates the “what if reliability” analysis for a moderate observed 
correlation coefficient (i.e., r = .30; Cohen, 1988) as a function of the reliability 
estimates of the independent and dependent variables. (Table 3 presents the Excel 
spreadsheet commands for conducting the “what if reliability” analysis.) For example, from 
Table 1, it can be seen that if the score reliability index for both the independent and 
dependent variables is .8, a correlation coefficient of .30 would yield a corrected 
correlation of .375. From Table 2, it can be seen that this corrected correlation would be 
statistically significant with a sample size as small as 28. On the other hand, if the 
reliability coefficient for both the independent and dependent variables is .6, a correlation 
coefficient of .30 would yield a corrected correlation index of .50 (Table 1), which, in 
turn, would be statistically significant with a sample size as small as 16 (Table 2). 
Comparing these two sets of findings indicates that, holding all other factors constant, a 
reduction in the score reliability indices of both measures from .8 to .6 (i.e., 25% 
reduction) results in a smaller sample size (from 28 to 16, or a 42.9% reduction in 
number of cases) needed. Thus, the more unreliable the measure(s), the larger the 
correction to the correlation coefficient, and the less expensive the study would be in 
terms of the sample size needed to obtain statistical significance if the scores for both 
variables had yielded perfect reliability (again, holding everything else constant). 
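For quick planning, the same comparison can be approximated in closed form. The Python sketch below swaps in Fisher's z transformation (a large-sample normal approximation) instead of an exact t test, so in general its sample sizes may differ slightly from exact values; for reliabilities of .8 and .6 on both measures, it gives the same minima of 28 and 16 reported above. The reliability levels in the loop are illustrative, both measures are assumed to share the same reliability, the function name is hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist


def min_n_fisher_z(rho, alpha=0.05):
    """Approximate smallest n at which a correlation rho is significant (two-tailed).

    Uses Fisher's z: atanh(rho) * sqrt(n - 3) must exceed the normal critical value.
    """
    if not 0.0 < abs(rho) < 1.0:
        raise ValueError("rho must be nonzero and strictly between -1 and 1")
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    bound = (z_crit / math.atanh(abs(rho))) ** 2  # need n - 3 > bound
    return math.floor(bound) + 4


r_xy = 0.30
baseline = min_n_fisher_z(r_xy)  # no correction: both reliabilities treated as 1.0
for rel in (0.9, 0.8, 0.7, 0.6):
    rho = r_xy / rel  # equal reliabilities, so sqrt(rel * rel) = rel
    n_min = min_n_fisher_z(rho)
    saving = 100.0 * (1.0 - n_min / baseline)
    print(f"r_xx = r_yy = {rel:.1f}: corrected r = {rho:.3f}, approx. n = {n_min} "
          f"({saving:.1f}% fewer cases than for the uncorrected r = .30)")
```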



Insert Table 2 about here 



Insert Table 3 about here 
Discussion 

The illustrative results presented in Tables 1 and 2 make clear how unreliability 
affects statistical power. The “what if reliability” analysis demonstrates that attenuation of the 
correlation coefficient due to unreliability compels researchers to have a larger sample 
size in order to obtain a statistically significant result, in addition to diminishing the 
potential value of the observed correlation coefficient. This is particularly problematic with 
small effect sizes (i.e., small correlation coefficients). 

It should be noted that the “what if reliability” analysis is theoretical in nature 
because it assumes that the exact correlation coefficient would be replicated in a future 
study with fewer (or more) participants. However, this technique is no more theoretical 
than when one conducts a statistical power analysis in order to determine an appropriate 
sample size. When conducting power analyses, researchers make the assumption that 
the hypothesized/desired effect size would be replicated in their future study. Confidence 
intervals, which have been correctly given elevated status by APA (2001), also represent 
theoretical values. Consequently, if it is justified to criticize the “what if reliability” analysis 
for its theoretical interpretation, then other analytic tools (e.g., power analysis, confidence 
intervals, internal replications) should be held to the same standard. In any case, the 
proposed “what if reliability” procedure helps to remind researchers of the importance of 
selecting instruments that offer good potential for yielding reliable scores for their 
intended samples. 

Further, it should be noted that the “what if reliability” analysis readily extends to 
more complex families of the general linear model. For example, squaring the double 
correction formula above would lead to a disattenuated $R^2$. Thus, for example, in a 
multiple regression model, the proportion of variance explained by a regressor variable 
can be corrected for unreliability, and the role of the sample size investigated using this 
corrected index. 
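In the bivariate (single-regressor) case, the squared double correction is easy to compute, as the short Python sketch below shows with the reliabilities from the earlier worked example. It is only a sketch under that assumption, and the function name is hypothetical: with several correlated regressors, one approach would be to correct each correlation in the matrix before fitting the model, which this snippet does not attempt.

```python
def disattenuated_r_squared(r_xy, r_xx, r_yy):
    """Square of Spearman's double-corrected correlation, r_xy**2 / (r_xx * r_yy), capped at 1."""
    return min(1.0, r_xy ** 2 / (r_xx * r_yy))


observed = 0.30 ** 2
corrected = disattenuated_r_squared(0.30, r_xx=0.70, r_yy=0.80)
print(f"observed r^2 = {observed:.3f}, disattenuated r^2 = {corrected:.3f}")
# observed r^2 = 0.090, disattenuated r^2 = 0.161
```

The corrected proportion of variance explained could then be passed to a "what if" sample size search in place of the observed one.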

The “what if reliability” examples provided in this paper show how p-values and effect sizes 
can potentially misrepresent reality when the reliability context is not 
considered. This, in turn, further reinforces the importance of reporting sample-specific 
reliability estimates for all variables because, without these indices, a “what if reliability” 
analysis could not be conducted, rendering it impossible for the analyst to rule out 
unreliability as a rival explanation for the observed findings. Routine reporting of reliability 
coefficients for scores on all variables within a given study would provide adequate 
evidence for interested readers to conduct “what if reliability” analyses if so desired. 